CUDA kernel development begins with defining a kernel: a C++ function designed to execute in parallel on NVIDIA GPUs, which provide a massive number of cores. These functions are the fundamental unit of work in the CUDA programming model, acting as the bridge that turns serial host logic into massively parallel device execution.
1. The __global__ Specifier
The __global__ declaration specifier is the required marker that instructs the compiler to generate GPU code while keeping the function's entry point visible to the CPU. A function that executes on the GPU and can be invoked from the host is called a kernel.
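As a minimal sketch of this idea, the following defines a trivial kernel and launches it from host code; the grid and block sizes are illustrative values, not requirements:

```cuda
#include <cstdio>

// __global__: compiled for the GPU, entry point visible to the CPU.
__global__ void hello()
{
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main()
{
    // Launch configuration: 1 block of 4 threads (illustrative).
    hello<<<1, 4>>>();
    cudaDeviceSynchronize();  // wait for the asynchronous launch to finish
    return 0;
}
```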
2. Execution Environment
Kernels are dispatched to and executed on Streaming Multiprocessors (SMs). The SM is the primary compute engine inside an NVIDIA GPU, responsible for managing hundreds of threads executing in parallel. Each SM handles thread blocks and schedules their threads onto its processing cores.
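You can inspect the SM count of the GPU you are running on through the CUDA runtime API; a short host-only sketch (device index 0 is assumed):

```cuda
#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // query properties of device 0
    printf("%s has %d SMs\n", prop.name, prop.multiProcessorCount);
    return 0;
}
```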
Syntax rule: kernels must return void. Because they operate asynchronously with respect to the host, they cannot return a value directly to the CPU; results must be written back to allocated device memory.
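Since a kernel cannot return a value, results travel through device pointers. A minimal sketch of that round trip (error checking omitted for brevity):

```cuda
#include <cstdio>

// void return: the result is written through a device pointer instead.
__global__ void square(int* out, int x)
{
    *out = x * x;
}

int main()
{
    int* d_out;
    cudaMalloc(&d_out, sizeof(int));   // allocate device memory
    square<<<1, 1>>>(d_out, 7);        // asynchronous launch
    int h_out = 0;
    cudaMemcpy(&h_out, d_out, sizeof(int),
               cudaMemcpyDeviceToHost);  // copy implicitly synchronizes
    printf("7 squared = %d\n", h_out);
    cudaFree(d_out);
    return 0;
}
```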
QUESTION 1
What is the primary function of the __global__ specifier?
It defines a function that runs on the CPU but is callable from the GPU.
It defines a kernel that runs on the GPU and is callable from the CPU.
It allocates memory on the GPU's SM cache.
It synchronizes all threads in a block.
✅ Correct!
Correct! __global__ is the bridge used to launch kernels from Host code.
❌ Incorrect
Incorrect. __global__ specifically identifies entry-point kernels for GPU execution called by the Host.
QUESTION 2
Why must CUDA kernels return void?
Because they execute asynchronously and have no direct path to return values to the Host thread.
To save registers on the SM.
Because GPU memory is read-only.
The NVCC compiler does not support float returns.
✅ Correct!
Exactly. Since kernels launch asynchronously, the Host doesn't wait for a return value; results must be written to pointers.
❌ Incorrect
The return value restriction is due to the asynchronous nature of GPU execution and Host-Device separation.
QUESTION 3
Which hardware component is responsible for managing and executing threads in a CUDA kernel?
The PCIe Controller.
The Streaming Multiprocessor (SM).
The Host RAM controller.
The BIOS.
✅ Correct!
Yes, the SM is the core unit on the GPU where threads are scheduled and executed.
❌ Incorrect
The SM (Streaming Multiprocessor) is the heart of GPU compute hardware.
QUESTION 4
What happens when a Host calls a kernel function?
The CPU halts until the GPU finishes processing.
The GPU creates a clone of the function for every available SM.
The kernel is enqueued for execution on the GPU, and the CPU continues to the next instruction.
The CPU performs a context switch to the GPU.
✅ Correct!
Correct. Kernel launches are non-blocking (asynchronous) from the Host's perspective.
❌ Incorrect
CUDA kernels are asynchronous; the CPU does not wait unless explicitly told to do so via synchronization.
QUESTION 5
Which of the following is the correct definition of a CUDA kernel?
A function that executes on the GPU and is invoked from the Host.
A C++ library for file I/O.
A hardware driver for NVIDIA GPUs.
A standard CPU function with the __gpu__ prefix.
✅ Correct!
Perfect. This is the fundamental definition of a kernel in CUDA programming.
❌ Incorrect
A kernel is specifically the code designed to run on the GPU while being launched from the Host.
Module Challenge: Designing a Vector Subtraction Kernel
Applying kernel fundamentals to data transformation.
You are tasked with porting a signal processing routine to the GPU. The core operation subtracts background noise (Vector B) from a signal (Vector A) into a result (Vector C).
Q
1. Write the function signature for a kernel named 'vecSub' that takes three float pointers.
Solution:
__global__ void vecSub(float* A, float* B, float* C)
Q
2. In which hardware unit will the logic inside your 'vecSub' kernel physically reside during execution?
Solution:
The logic will be executed on the Streaming Multiprocessors (SMs) of the NVIDIA GPU.
Q
3. If you attempt to return a float status code from this kernel, why will the compiler throw an error?
Solution:
CUDA kernels must have a void return type because they are executed asynchronously. Any status reporting or data return must be performed through device memory pointers.
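Extending the challenge, a complete vecSub body using the standard global-index pattern might look like the sketch below. Note the element count n and the launch configuration are illustrative additions of ours; the challenge's three-pointer signature omits them:

```cuda
#include <cstdio>

__global__ void vecSub(float* A, float* B, float* C, int n)
{
    // Each thread computes one element of C.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // guard against over-provisioned threads
        C[i] = A[i] - B[i];     // signal minus background noise
}

int main()
{
    const int n = 1024;                 // illustrative vector length
    size_t bytes = n * sizeof(float);
    float *d_A, *d_B, *d_C;
    cudaMalloc(&d_A, bytes);
    cudaMalloc(&d_B, bytes);
    cudaMalloc(&d_C, bytes);
    // ... copy signal (A) and noise (B) into device memory here ...
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;  // ceiling division
    vecSub<<<blocks, threads>>>(d_A, d_B, d_C, n);
    cudaDeviceSynchronize();            // wait for the asynchronous launch
    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    return 0;
}
```

The bounds check `i < n` is the conventional guard when the thread count is rounded up past the data size.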